GBDT model feature interpretation method and apparatus

ABSTRACT

Implementations of the present specification disclose methods, devices, and apparatuses for determining a feature interpretation of a predicted label value of a user generated by a GBDT model. In one aspect, the method includes separately obtaining, from each of a predetermined quantity of decision trees ranked among top decision trees, a leaf node and a score of the leaf node; determining a respective prediction path of each leaf node; obtaining, for each parent node on each prediction path, a split feature and a score of the parent node; determining, for each child node on each prediction path, a feature corresponding to the child node and a local increment of the feature on the child node; obtaining a collection of features respectively corresponding to the child nodes; and obtaining a respective measure of relevance between the feature corresponding to the at least one child node and the predicted label value.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of PCT Application No. PCT/CN2019/076264, filed on Feb. 27, 2019, which claims priority to Chinese Patent Application No. 201810488062.X, filed on May 21, 2018, and each application is hereby incorporated by reference in its entirety.

TECHNICAL FIELD

Implementations of the present specification relate to the field of data processing technologies, and more specifically, to methods and apparatuses for determining feature interpretations of predicted labels of users.

BACKGROUND

With the rapid development of Internet technologies, data mining on the Internet becomes increasingly important. Generally, in data mining, modeling is performed based on labeled data through machine learning, so that a trained model can be used to process data to be predicted. Among many machine learning algorithms, the gradient boosting decision tree (GBDT) algorithm is more and more widely applied because of its excellent learning performance. The GBDT algorithm is a machine learning technology used for tasks such as regression, classification, and ranking. The GBDT algorithm combines multiple weak learners (usually decision trees) to obtain a strong prediction model. The GBDT model is iterated multiple times, and in each iteration, a loss function is reduced in a gradient direction to obtain multiple decision trees. With the extensive application of the GBDT algorithm, there is a growing need to interpret GBDT models. In addition to feature importance parameters, which are currently commonly used as global interpretations, interpretation of a local feature contribution for a single user mainly includes the following two methods: extracting a preferred solution from the GBDT model through remodeling for interpretation; and adjusting an eigenvalue to test the impact of the feature on the prediction performance loss.

SUMMARY

Implementations of the present specification are intended to provide a more effective GBDT model interpretation solution to address shortcomings of the existing technology.

To achieve the objective above, according to an aspect of the present specification, a method for determining a feature interpretation of a predicted label value of a user is provided. The method is performed after a prediction of a label value of the user is generated by using a GBDT model. The feature interpretation includes multiple features of the user that are relevant to the predicted label value of the user and a respective measure of relevance between each of the features and the predicted label value. The GBDT model includes multiple decision trees arranged in a predetermined order. The method includes the following: separately obtaining, from a predetermined quantity of decision trees that are ranked among the top decision trees in the predetermined order, a leaf node including the user and a score of the leaf node, where the score of the leaf node is a score predetermined by using the GBDT model; determining a respective prediction path of each leaf node, where the prediction path is a node connection path from the leaf node to a root node of a decision tree in which the leaf node is located; obtaining a split feature and a score of each parent node on each prediction path, where the score of the parent node is determined based on predetermined scores of leaf nodes of a decision tree in which the parent node is located; determining, for each child node on each prediction path based on a score of the child node and a score and a split feature of a parent node of the child node, a feature corresponding to the child node and a local increment of the feature on the child node, where the feature corresponding to the child node is a feature relevant to the predicted label value of the user; obtaining a collection of features respectively corresponding to all child nodes as the multiple features relevant to the predicted label value of the user; and obtaining, by computing a sum of a local increment of the feature of at least one of the child nodes that corresponds to a same feature, a measure of relevance between the feature corresponding to the at least one child node and the predicted label value.

In an implementation, in the method for determining a feature interpretation of a predicted label value of a user, the score of the parent node is determined based on the predetermined scores of the leaf nodes of the decision tree in which the parent node is located. The score of the parent node is an average value of the scores of the two child nodes of the parent node.

In an implementation, in the method for determining a feature interpretation of a predicted label value of a user, the score of the parent node is determined based on the predetermined scores of the leaf nodes of the decision tree in which the parent node is located. The score of the parent node is a weighted average value of the scores of the two child nodes of the parent node, and the weights of the scores of the child nodes are determined based on the quantities of samples respectively allocated to the child nodes in a training process of the GBDT model.

In an implementation, in the method for determining a feature interpretation of a predicted label value of a user, the determining a feature corresponding to the child node and a local increment of the feature on the child node includes the following: obtaining a difference between the score of the child node and the score of the parent node as the local increment of the feature.

In an implementation, in the method for determining a feature interpretation of a predicted label value of a user, the GBDT model is a classification model or a regression model.

In an implementation, in the method for determining a feature interpretation of a predicted label value of a user, the predetermined quantity of decision trees that are ranked among the top decision trees in the predetermined order include multiple decision trees that are included in the GBDT model and that are arranged in a predetermined order.

According to another aspect of the present specification, an apparatus for determining a feature interpretation of a predicted label value of a user is provided. The apparatus is implemented after a prediction of a label value of the user is generated by using a GBDT model. The feature interpretation includes multiple features of the user that are relevant to the predicted label value of the user and a respective measure of relevance between each of the features and the predicted label value. The GBDT model includes multiple decision trees arranged in a predetermined order. The apparatus includes the following: a first acquisition unit, configured to separately obtain, from a predetermined quantity of decision trees ranked among the top decision trees, a leaf node including the user and a score of the leaf node, where the score of the leaf node is a score predetermined by using the GBDT model; a first determining unit, configured to determine a respective prediction path of each leaf node, where the prediction path is a node connection path from the leaf node to a root node of a decision tree in which the leaf node is located; a second acquisition unit, configured to obtain a split feature and a score of each parent node on each prediction path, where the score of the parent node is determined based on predetermined scores of leaf nodes of a decision tree in which the parent node is located; a second determining unit, configured to determine, for each child node on each prediction path based on a score of the child node and a score and a split feature of a parent node of the child node, a feature corresponding to the child node and a local increment of the feature on the child node, where the feature corresponding to the child node is a feature relevant to the predicted label value of the user; a feature acquisition unit, configured to obtain a collection of features respectively corresponding to all child nodes as the multiple features relevant to the predicted label value of the user; and a relevance determination unit, configured to obtain, by computing a sum of a local increment of the feature of at least one of the child nodes that corresponds to a same feature, a respective measure of relevance between the feature corresponding to the at least one child node and the predicted label value.

According to the GBDT model interpretation solution of the implementations of the present specification, a high quality user-level model interpretation of the GBDT model can be obtained by using only existing parameters and prediction results in the GBDT model, and the computation cost is relatively low. In addition, the solution in the implementations of the present specification is applicable to various GBDT models, and features high applicability and high operability.

BRIEF DESCRIPTION OF DRAWINGS

The implementations of the present specification are described with reference to the accompanying drawings so that the implementations of the present specification can be clearer.

FIG. 1 illustrates a method for determining a feature interpretation of a predicted label value of a user, according to an implementation of the present specification;

FIG. 2 illustrates a decision tree included in a GBDT model, according to an implementation of the present specification;

FIG. 3 is a schematic diagram illustrating implementation of a method according to an implementation of the present specification based on the decision tree illustrated in FIG. 2; and

FIG. 4 illustrates an apparatus 400 for obtaining a feature interpretation of a predicted label value of a user, according to an implementation of the present specification.

DESCRIPTION OF IMPLEMENTATIONS

The following describes the implementations of the present specification with reference to the accompanying drawings.

An application scenario of the implementations of the present specification is first described. A model interpretation method according to the implementations of the present specification is performed after a prediction of a label value of a user is generated by using a GBDT model. The GBDT model is obtained through the following training process: A training set $D_1 = \{x^{(i)}, y^{(i)}\}_{i=1}^{N}$ is first obtained, where $N$ is the quantity of training samples, that is, the quantity of users. $x^{(i)}$ is the feature vector of the $i$th user, and the feature vector is, for example, an $S$-dimensional vector, that is, $x = (x_1, x_2, \ldots, x_S)$. $y^{(i)}$ is the known label value of the $i$th user. For example, the GBDT model is a model for predicting credit card fraud. Therefore, $x^{(i)}$ can be a user's credit card record data, transaction record data, etc., and $y^{(i)}$ can be the user's fraud risk value. Then, the $N$ users can be divided through a first decision tree. To be specific, a split feature and a feature threshold are set on each parent node of the decision tree, and the corresponding features of users are compared with the feature threshold on the parent node so as to divide and allocate the users to corresponding child nodes. Through such a process, the $N$ users can finally be divided and allocated to leaf nodes. A score of each leaf node is an average value of the label values (namely, $y^{(i)}$) of the users on the leaf node.

After the first decision tree is obtained, a residual $r^{(i)}$ of each user can be obtained by subtracting the score of the user's leaf node in the first decision tree from the user's known label value. $D_2 = \{x^{(i)}, r^{(i)}\}_{i=1}^{N}$ is used as a new training set, which corresponds to the same user set as $D_1$. A second decision tree can be obtained in the same way as above. In the second decision tree, the $N$ users are divided and allocated to leaf nodes, and a score of each leaf node is an average value of the residual values of the users on the leaf node. Similarly, multiple decision trees can be obtained in a predetermined order, and each decision tree is obtained based on the residuals of the previous decision tree. Therefore, a GBDT model including multiple decision trees can be obtained.
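As an illustrative sketch only (the implementations here do not prescribe any particular code or library), the residual-fitting loop described above can be written as follows. The function name train_gbdt, the parameters n_trees and max_depth, and the use of scikit-learn's DecisionTreeRegressor are assumptions for illustration, not part of the present specification:

```python
# Hypothetical sketch of the training process described above: each tree is
# fit to the residuals left by the previous trees, and a leaf's score is the
# average target value of the training samples allocated to that leaf.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def train_gbdt(X, y, n_trees=5, max_depth=3):
    trees = []
    residual = np.asarray(y, dtype=float).copy()
    for _ in range(n_trees):
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residual)        # leaf score = mean target of its samples
        residual -= tree.predict(X)  # the next tree models what is left over
        trees.append(tree)
    return trees
```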

During prediction of a user's label value, a feature vector of the user is input to the GBDT model, and each decision tree in the GBDT model allocates the user to a corresponding leaf node based on the split features and split thresholds of the parent nodes in the decision tree, so as to obtain a predicted label value of the user by computing a sum of the scores of the leaf nodes in which the user is located.
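Continuing the hypothetical sketch above, prediction is then simply the sum of the leaf scores returned by each tree:

```python
# Sum of leaf scores across all trees (sketch only; predict_gbdt is an
# illustrative name, and trees is the list returned by train_gbdt above).
def predict_gbdt(trees, X):
    return sum(tree.predict(X) for tree in trees)
```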

After the previous prediction process, a feature interpretation of the predicted label value of the user can be obtained based on existing parameters and a prediction result in the GBDT model according to the model interpretation method in the implementations of the present specification. That is, from each of the decision trees, the leaf node in which the user is located is obtained, and a prediction path including the leaf node is determined; for each child node on the prediction path, a feature relevant to the predicted label value and a local increment of the feature are calculated; and the local increments of a same feature across all the decision trees are accumulated as a measure of relevance between the feature and the predicted label value, that is, a feature contribution of the feature to the predicted label value. As such, feature interpretation is performed on the predicted label value of the user by using the feature and its feature contribution. The previous GBDT model is a regression model, that is, the label predicted by the GBDT model is a continuous value, for example, a fraud risk value, an age, etc. However, the GBDT model is not limited to a regression model, and can further be a classification model, a recommendation model, etc. These models each can use the GBDT model interpretation method according to the implementations of the present specification.

FIG. 1 illustrates a method for determining a feature interpretation of a predicted label value of a user, according to an implementation of the present specification. The method is performed after a prediction of a label value of the user is generated by using a GBDT model. The feature interpretation includes multiple features of the user that are relevant to the predicted label value of the user and a respective measure of relevance between each of the features and the predicted label value. The GBDT model includes multiple decision trees arranged in a predetermined order. The method includes the following steps: At step S11, from a predetermined quantity of decision trees ranked among the top decision trees, a leaf node including the user and a score of the leaf node are separately obtained, where the score of the leaf node is a score predetermined by using the GBDT model. At step S12, a respective prediction path of each leaf node is determined, where the prediction path is a node connection path from the leaf node to a root node of a decision tree in which the leaf node is located. At step S13, a split feature and a score of each parent node on each prediction path are obtained, where the score of the parent node is determined based on predetermined scores of leaf nodes of a decision tree in which the parent node is located. At step S14, for each child node on each prediction path, a feature corresponding to the child node and a local increment of the feature on the child node are determined based on a score of the child node and a score and a split feature of a parent node of the child node, where the feature corresponding to the child node is a feature relevant to the predicted label value of the user. At step S15, a collection of features respectively corresponding to all child nodes is obtained as the multiple features relevant to the predicted label value of the user. At step S16, by computing a sum of a local increment of the feature of at least one of the child nodes that corresponds to a same feature, a respective measure of relevance between the feature corresponding to the at least one child node and the predicted label value is obtained.

First, at step S11, from the predetermined quantity of decision trees ranked among the top decision trees, the leaf node including the user and the score of the leaf node are separately obtained, where the score of the leaf node is a score predetermined by using the GBDT model.

As described above, among the multiple decision trees that are included in the GBDT model, each decision tree is obtained based on the label value residuals of its previous decision tree; that is, the scores of the leaf nodes of the successive decision trees become increasingly small. Accordingly, a local increment of a user feature that is relevant to the user's predicted label value and that is determined by using each of the decision trees arranged in the predetermined order also becomes orders of magnitude smaller. It can be predicted that a local increment of a feature obtained from a lower-ranked decision tree has increasingly small impact on the measure of relevance of the feature to the predicted label value (that is, the sum of all local increments of the feature), and the impact can even be approximately zero. Therefore, the predetermined quantity of decision trees ranked among the top decision trees can be selected to implement the method according to the implementations of the present specification. The predetermined quantity can be determined by using a predetermined condition. For example, the predetermined quantity can be determined based on an order of magnitude of the scores of leaf nodes, or based on a predetermined percentage of the decision trees. In an implementation, the method according to the implementations of the present specification can be implemented for all the decision trees that are included in the GBDT model to obtain accurate model interpretations.

FIG. 2 illustrates a decision tree that is included in the GBDT model, according to an implementation of the present specification. As shown in FIG. 2, the node denoted by 0 in the figure is the root node of the decision tree, and the nodes denoted by 3, 7, 8, 13, 14, 10, 11, and 12 in the figure are leaf nodes of the decision tree. The value depicted under each leaf node (for example, 0.136 under node 3) is the score of the leaf node, and the score is determined by the GBDT model in training based on the known label values of the multiple samples allocated to the leaf nodes. As shown in the rectangular dashed-line box in FIG. 2, two nodes 11 and 12 are divided from node 6. Therefore, node 6 is the parent node of node 11 and node 12, and both node 11 and node 12 are child nodes of node 6. As shown in FIG. 2, features and value ranges are depicted over the arrows from some parent nodes to child nodes in the figure. For example, “f5≤−0.5” is depicted over the arrow from node 0 to node 1, and “f5>−0.5” is depicted over the arrow from node 0 to node 2. Here, f5 represents feature 5, which is the split feature of node 0, and −0.5 is the split threshold of node 0.
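For illustration only, a tree like the one in FIG. 2 can be represented with a simple node structure; the class name Node and all of its fields are hypothetical and are not part of the present specification:

```python
# Hypothetical node structure mirroring FIG. 2: internal nodes carry a split
# feature and threshold, leaf nodes carry a score learned in training.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    node_id: int
    feature: Optional[int] = None      # split feature index, e.g. 5 for "f5"
    threshold: Optional[float] = None  # split threshold, e.g. -0.5
    left: Optional["Node"] = None      # child taken when x[feature] <= threshold
    right: Optional["Node"] = None     # child taken when x[feature] > threshold
    score: Optional[float] = None      # known for leaf nodes after training
    n_samples: int = 0                 # training samples routed to this node
    parent: Optional["Node"] = None    # kept so prediction paths can be traced
```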

FIG. 3 is a schematic diagram illustrating implementation of a method in an implementation of the present specification based on the decision tree illustrated in FIG. 2. As shown in FIG. 3, when a prediction of a user's label value is generated by using a GBDT model including the decision tree shown in FIG. 3, assuming that the user is allocated to node 14 in the decision tree, node 14 including the user and the score of node 14 can be determined from the decision tree. In addition, in another decision tree that is included in the GBDT model, the leaf node in which the user is located and the score of the leaf node can be similarly determined. Therefore, a predetermined quantity of leaf nodes and their corresponding scores can be obtained; that is, one leaf node can be obtained from each of the predetermined quantity of decision trees.

At step S12, a respective prediction path of each leaf node is determined, where the prediction path is a node connection path from the leaf node to the root node of the decision tree in which the leaf node is located. Referring back to FIG. 3, in the decision tree shown in FIG. 3, after leaf node 14 in which the user is located is determined, it can be determined that the prediction path is the path from node 0 to node 14 in the figure, indicated by the node connection path connected by bold arrows. In addition, in each other decision tree in the predetermined quantity of decision trees, a prediction path can be similarly obtained, so as to obtain a predetermined quantity of prediction paths.
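Assuming the hypothetical Node class sketched above (with its parent references), the prediction path can be traced from the leaf back to the root as follows:

```python
# Walk parent links from the leaf up to the root, then reverse, so the path
# reads root-first (node 0 -> ... -> node 14 in FIG. 3). Sketch only.
def prediction_path(leaf):
    path = []
    node = leaf
    while node is not None:
        path.append(node)
        node = node.parent
    path.reverse()
    return path
```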

At step S13, the split feature and the score of each parent node on each prediction path are obtained, where the score of the parent node is determined based on the predetermined scores of the leaf nodes of the decision tree in which the parent node is located. Referring to FIG. 3, on the prediction path from node 0 to node 14, each node except node 14 has a child node; that is, the parent nodes on this path include node 0, node 2, node 5, and node 9. As described above with reference to FIG. 2, the split feature of a parent node can be directly obtained from the decision tree. For example, referring to FIG. 2, it can be determined that the split feature of node 0 is feature 5, the split feature of node 2 is feature 2, the split feature of node 5 is feature 4, and the split feature of node 9 is feature 4. In an implementation, the score of the parent node can be determined based on the following equation (1):

$S_p = \frac{1}{2}\left(S_{c1} + S_{c2}\right) \qquad (1)$

$S_p$ is the score of the parent node, and $S_{c1}$ and $S_{c2}$ are the scores of the two child nodes of the parent node. That is, the score of the parent node is the average value of the scores of its two child nodes. For example, as shown in FIG. 3, based on the scores of node 13 and node 14, it can be determined that the score of node 9 is ½(0.06+0.062)=0.061. Similarly, based on the scores of node 9 and node 10, it can be determined that the score of node 5 is 0.0625; based on the scores of nodes 5 and 6, it can be determined that the score of node 2 is 0.0698; and based on the scores of nodes 1 and 2, it can be determined that the score of node 0 is 0.0899. It can be understood that the score of each parent node on the prediction path shown in FIG. 3 can be determined based on the scores of the leaf nodes in the figure. For example, the score of node 5 can be determined based on the scores of nodes 13, 14, and 10, and the score of node 2 can be determined based on the scores of nodes 13, 14, 10, 11, and 12.
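A minimal sketch of this bottom-up computation, again assuming the hypothetical Node class above (fill_scores_average is an illustrative name):

```python
# Recursively fill parent scores using equation (1): a parent's score is the
# plain average of its two children's scores; leaf scores come from training.
def fill_scores_average(node):
    if node.left is None and node.right is None:
        return node.score
    s1 = fill_scores_average(node.left)
    s2 = fill_scores_average(node.right)
    node.score = 0.5 * (s1 + s2)  # equation (1)
    return node.score
```

Applied to FIG. 3, this reproduces the values above, for example 0.5×(0.06+0.062)=0.061 for node 9.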

In an implementation, the score of the parent node can be determined based on the following equation (2):

$S_p = \frac{N_{c1} \times S_{c1} + N_{c2} \times S_{c2}}{N_{c1} + N_{c2}} \qquad (2)$

$N_{c1}$ and $N_{c2}$ are the quantities of samples that are respectively allocated to child nodes c1 and c2 in model training. That is, the score of the parent node is a weighted average value of the scores of the two child nodes of the parent node, and the weights of the two child nodes are the quantities of samples that are allocated to the two child nodes in the model training process. In actual applications or experimental tests according to the implementations of the present specification, it can be determined that a higher quality model interpretation can be obtained by using the score of the parent node determined by using equation (2) than by using the score determined by using equation (1). In addition, in the implementations of the present specification, calculation of the score of the parent node is not limited to the previous equations (1) and (2). For example, the parameters in equations (1) and (2) can be adjusted so that the model interpretation is more accurate. Moreover, the score of each parent node can be obtained based on the scores of the leaf nodes by using a geometric average value, a root mean square value, etc.
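The same bottom-up pass with equation (2), again a sketch over the assumed Node class. Because a leaf's score is the mean target of its training samples, this weighting makes each parent's score the mean target of all samples under it:

```python
# Recursively fill parent scores using equation (2): children are weighted by
# the quantities of training samples routed to them in model training.
def fill_scores_weighted(node):
    if node.left is None and node.right is None:
        return node.score, node.n_samples
    s1, n1 = fill_scores_weighted(node.left)
    s2, n2 = fill_scores_weighted(node.right)
    node.n_samples = n1 + n2
    node.score = (n1 * s1 + n2 * s2) / (n1 + n2)  # equation (2)
    return node.score, node.n_samples
```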

At step S14, for each child node on each prediction path, the feature corresponding to the child node and the local increment of the feature on the child node are determined based on the score of the child node and the score and the split feature of the parent node of the child node, where the feature corresponding to the child node is a feature relevant to the predicted label value of the user.

Referring to FIG. 3, on the prediction path from node 0 to node 14, all nodes except root node 0 are child nodes of previous nodes; that is, the child nodes on the path include node 2, node 5, node 9, and node 14. A child node on the prediction path can be reached only through feature splitting on the parent node on the prediction path. Therefore, the split feature of the parent node is a feature relevant to the predicted label value on the child node. For ease of description, the split feature is described as the feature corresponding to the child node or the contributing feature on the child node. For example, as shown in FIG. 3, the feature corresponding to node 2 is feature 5, the feature corresponding to node 5 is feature 2, the feature corresponding to node 9 is feature 4, and the feature corresponding to node 14 is feature 4.

In an implementation, the local increment of the feature on each child node can be obtained by using the following equation (3):

$LI_c^f = S_c - S_p \qquad (3)$

$LI_c^f$ represents the local increment of feature f on child node c, $S_c$ represents the score of the child node, and $S_p$ represents the score of the parent node of the child node. The effectiveness of this equation can be verified through actual applications or experimental tests.

By using equation (3), based on the score of each parent node obtained in step S13, it can be easily obtained through calculation that the local increment of feature 5 (f5) on node 2 is −0.0201 (that is, 0.0698−0.0899), the local increment of feature 2 (f2) on node 5 is −0.0073, the local increment of feature 4 (f4) on node 9 is −0.0015, and the local increment of feature 4 (f4) on node 14 is 0.001.
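As an illustrative sketch assuming the Node class and the prediction_path helper above, the per-path local increments of equation (3) can be computed as follows:

```python
# For each child on the path, credit the parent's split feature with the
# change in score from parent to child (equation (3)). Sketch only.
def local_increments(path):
    increments = []  # list of (feature, local_increment) pairs
    for parent, child in zip(path, path[1:]):
        increments.append((parent.feature, child.score - parent.score))
    return increments
```

On the FIG. 3 path this yields (f5, −0.0201), (f2, −0.0073), (f4, −0.0015), and (f4, 0.001), matching the values above.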

In the implementations of the present specification, calculation of the local increment is not limited to the previous equation (3), and the local increment can further be calculated by using another calculation method. For example, the score of the parent node or the score of the child node in equation (3) can be multiplied by a correction parameter to make the model interpretation more accurate.

At step S15, the collection of features respectively corresponding to all the child nodes is obtained as the multiple features relevant to the predicted label value of the user. For example, referring to FIG. 3, in the decision tree shown in FIG. 3, the features relevant to the predicted label value of the user, namely, feature 5, feature 2, and feature 4, can be obtained from the prediction path. Similarly, features relevant to the predicted label value of the user can be obtained from the other decision trees in the predetermined quantity of decision trees. These features are grouped together to obtain a collection of multiple features relevant to the predicted label value of the user.

At step S16, by computing a sum of the local increments of the feature of at least one of the child nodes that corresponds to a same feature, a respective measure of relevance between the feature corresponding to the at least one child node and the predicted label value is obtained. For example, in the decision tree shown in FIG. 3, nodes 9 and 14 on the prediction path both correspond to feature 4, and therefore a sum of the local increments on nodes 9 and 14 can be computed. For example, when no prediction path child node corresponding to feature 4 is obtained in the other decision trees, a measure of relevance (or a feature contribution value) between feature 4 and the predicted label value can be obtained, which is −0.0015+0.0010=−0.0005. When another decision tree also includes a prediction path child node corresponding to feature 4, a sum of the local increments of all child nodes corresponding to feature 4 can be computed, so as to obtain the measure of relevance or the contribution value of feature 4. A larger relevance value indicates a higher measure of relevance between the feature and the predicted label value. When a relevance value is negative, it indicates that the measure of relevance between the feature and the predicted label value is very low. For example, in an instance of generating a prediction of a credit card fraud value of a user by using a GBDT model, a larger relevance value indicates a higher measure of relevance between the feature and the credit card fraud value, that is, the feature indicates a greater risk.
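Putting the pieces together, a sketch of the per-feature aggregation of step S16 across all selected prediction paths (one per top-ranked tree), reusing the hypothetical local_increments helper above:

```python
# Sum local increments per feature over all prediction paths (step S16).
from collections import defaultdict

def feature_contributions(paths):
    totals = defaultdict(float)
    for path in paths:
        for feature, increment in local_increments(path):
            totals[feature] += increment
    return dict(totals)  # e.g. feature 4 -> -0.0015 + 0.0010 = -0.0005
```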

By obtaining the multiple features relevant to the predicted label value of the user and the respective measures of relevance between the multiple features and the predicted label value, feature interpretation can be performed on the predicted label value of the user, so as to clarify the determining factors of the prediction. In addition, more information related to the user can be obtained through the feature interpretation. For example, in an instance of generating a prediction of a credit card fraud value of a user by using a GBDT model, by obtaining the multiple features relevant to the predicted label value of the user and the respective relevance values of the features, the impact aspect of each feature and the relevance value of the feature can be used as reference information for the predicted credit card fraud value of the user, so that assessment of the user is more accurate.

FIG. 4 illustrates an apparatus 400 for obtaining a feature interpretation of a predicted label value of a user, according to an implementation of the present specification. The apparatus 400 is implemented after a prediction of a label value of the user is generated by using a GBDT model. The feature interpretation includes multiple features of the user that are relevant to the predicted label value of the user and a respective measure of relevance between each of the features and the predicted label value. The GBDT model includes multiple decision trees arranged in a predetermined order. The apparatus 400 includes the following: a first acquisition unit 41, configured to separately obtain, from a predetermined quantity of decision trees ranked among the top decision trees, a leaf node including the user and a score of the leaf node, where the score of the leaf node is a score predetermined by using the GBDT model; a first determining unit 42, configured to determine a respective prediction path of each leaf node, where the prediction path is a node connection path from the leaf node to a root node of a decision tree in which the leaf node is located; a second acquisition unit 43, configured to obtain a split feature and a score of each parent node on each prediction path, where the score of the parent node is determined based on predetermined scores of leaf nodes of a decision tree in which the parent node is located; a second determining unit 44, configured to determine, for each child node on each prediction path based on a score of the child node and a score and a split feature of a parent node of the child node, a feature corresponding to the child node and a local increment of the feature on the child node, where the feature corresponding to the child node is a feature relevant to the predicted label value of the user; a feature acquisition unit 45, configured to obtain a collection of features respectively corresponding to all child nodes as the multiple features relevant to the predicted label value of the user; and a relevance determination unit 46, configured to obtain, by computing a sum of a local increment of the feature of at least one of the child nodes that corresponds to a same feature, a respective measure of relevance between the feature corresponding to the at least one child node and the predicted label value.

According to the GBDT model interpretation solution of the implementations of the present specification, a high quality user-level model interpretation of the GBDT model can be obtained by using only existing parameters and prediction results in the GBDT model, and the computation cost is relatively low. In addition, the solution in the implementations of the present specification is applicable to various GBDT models, and features high applicability and high operability.

A person of ordinary skill in the art can be further aware that, in combination with the examples described in the implementations disclosed in the present specification, units and algorithm steps can be implemented by electronic hardware, computer software, or a combination thereof. To clearly describe the interchangeability between hardware and software, the compositions and steps of each example are generally described above based on functions. Whether the functions are performed by hardware or software depends on the particular applications and design constraint conditions of the technical solutions. A person of ordinary skill in the art can use different methods to implement the described functions for each particular application, but it should not be considered that such an implementation goes beyond the scope of the present application.

Steps of the methods or algorithms described in the implementations disclosed in the present specification can be implemented by hardware, a software module executed by a processor, or a combination thereof. The software module can reside in a random access memory (RAM), a memory, a read-only memory (ROM), an electrically programmable ROM, an electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

In the described specific implementations, the objective, technical solutions, and benefits of the present disclosure are further described in detail. It should be understood that the descriptions are merely specific implementations of the present disclosure, and are not intended to limit the protection scope of the present disclosure. Any modification, equivalent replacement, or improvement made without departing from the spirit and principle of the present disclosure shall fall within the protection scope of the present disclosure.

What is claimed is:
1. A computer-implemented method comprising: generating a predicted label value of a user using a gradient boosting decision tree (GBDT) model, the GBDT model comprising multiple decision trees arranged in a predetermined order; separately obtaining, from each of a predetermined quantity of decision trees ranked among top decision trees, (i) a leaf node to which the user is assigned and (ii) a score of the leaf node that is determined by using the GBDT model; determining a respective prediction path of each leaf node, wherein the prediction path is a path from the leaf node to a root node of a decision tree in which the leaf node is located; obtaining, for each parent node on each prediction path, a split feature and a score of the parent node, wherein the score of the parent node is determined based on respective scores of child nodes of a decision tree in which the parent node is located; determining, for each child node on each prediction path and based on (i) a score of the child node, (ii) a score of the parent node, and (iii) a split feature of the parent node, a feature corresponding to the child node and a local increment of the feature on the child node, wherein the feature corresponding to the child node is a feature relevant to the predicted label value of the user; obtaining a collection of features respectively corresponding to the child nodes as the multiple features relevant to the predicted label value of the user; and obtaining, by computing a sum of a local increment of the feature of at least one of the child nodes that corresponds to a same feature, a respective measure of relevance between the feature corresponding to the at least one child node and the predicted label value.
2. The computer-implemented method of claim 1, wherein determining the score of the parent node based on the respective scores of the child nodes of the decision tree in which the parent node is located comprises: determining an average value of the scores of two child nodes of the parent node.

3. The computer-implemented method of claim 1, wherein determining the score of the parent node based on the respective scores of the child nodes of the decision tree in which the parent node is located comprises: determining respective weights of the scores of the child nodes based on respective quantities of training samples allocated to the child nodes during a training process of the GBDT model; and determining the score of the parent node as a weighted average value of the scores of two child nodes of the parent node.
4. The computer-implemented method of claim 1, wherein the determining a feature corresponding to the child node and a local increment of the feature on the child node comprises: determining a difference between the score of the child node and the score of the parent node; and using the difference as the local increment of the feature.
5. The computer-implemented method of claim 1, wherein the GBDT model is a classification model or a regression model.
6. The computer-implemented method of claim 1, wherein the predetermined quantity of decision trees ranked among top decision trees comprise a plurality of decision trees that are included in the GBDT model and that are arranged in a predetermined order.
7. A non-transitory, computer-readable medium storing one or more instructions executable by a computer system to perform one or more operations comprising: generating a predicted label value of a user using a gradient boosting decision tree (GBDT) model, the GBDT model comprising multiple decision trees arranged in a predetermined order; separately obtaining, from each of a predetermined quantity of decision trees ranked among top decision trees, (i) a leaf node to which the user is assigned and (ii) a score of the leaf node that is determined by using the GBDT model; determining a respective prediction path of each leaf node, wherein the prediction path is a path from the leaf node to a root node of a decision tree in which the leaf node is located; obtaining, for each parent node on each prediction path, a split feature and a score of the parent node, wherein the score of the parent node is determined based on respective scores of child nodes of a decision tree in which the parent node is located; determining, for each child node on each prediction path and based on (i) a score of the child node, (ii) a score of the parent node, and (iii) a split feature of the parent node, a feature corresponding to the child node and a local increment of the feature on the child node, wherein the feature corresponding to the child node is a feature relevant to the predicted label value of the user; obtaining a collection of features respectively corresponding to the child nodes as the multiple features relevant to the predicted label value of the user; and obtaining, by computing a sum of a local increment of the feature of at least one of the child nodes that corresponds to a same feature, a respective measure of relevance between the feature corresponding to the at least one child node and the predicted label value.
8. The non-transitory, computer-readable medium of claim 7, wherein determining the score of the parent node based on the respective scores of the child nodes of the decision tree in which the parent node is located comprises: determining an average value of the scores of two child nodes of the parent node.
9. The non-transitory, computer-readable medium of claim 7, wherein determining the score of the parent node based on the respective scores of the child nodes of the decision tree in which the parent node is located comprises: determining respective weights of the scores of the child nodes based on respective quantities of training samples allocated to the child nodes during a training process of the GBDT model; and determining the score of the parent node as a weighted average value of the scores of two child nodes of the parent node.

10. The non-transitory, computer-readable medium of claim 7, wherein the determining a feature corresponding to the child node and a local increment of the feature on the child node comprises: determining a difference between the score of the child node and the score of the parent node; and using the difference as the local increment of the feature.
11. The non-transitory, computer-readable medium of claim 7, wherein the GBDT model is a classification model or a regression model.

12. The non-transitory, computer-readable medium of claim 7, wherein the predetermined quantity of decision trees ranked among top decision trees comprise a plurality of decision trees that are included in the GBDT model and that are arranged in a predetermined order.
13. A computer-implemented system, comprising: one or more computers; and one or more computer memory devices interoperably coupled with the one or more computers and having tangible, non-transitory, machine-readable media storing one or more instructions that, when executed by the one or more computers, perform one or more operations comprising: generating a predicted label value of a user using a gradient boosting decision tree (GBDT) model, the GBDT model comprising multiple decision trees arranged in a predetermined order; separately obtaining, from each of a predetermined quantity of decision trees ranked among top decision trees, (i) a leaf node to which the user is assigned and (ii) a score of the leaf node that is determined by using the GBDT model; determining a respective prediction path of each leaf node, wherein the prediction path is a path from the leaf node to a root node of a decision tree in which the leaf node is located; obtaining, for each parent node on each prediction path, a split feature and a score of the parent node, wherein the score of the parent node is determined based on respective scores of child nodes of a decision tree in which the parent node is located; determining, for each child node on each prediction path and based on (i) a score of the child node, (ii) a score of the parent node, and (iii) a split feature of the parent node, a feature corresponding to the child node and a local increment of the feature on the child node, wherein the feature corresponding to the child node is a feature relevant to the predicted label value of the user; obtaining a collection of features respectively corresponding to the child nodes as the multiple features relevant to the predicted label value of the user; and obtaining, by computing a sum of a local increment of the feature of at least one of the child nodes that corresponds to a same feature, a respective measure of relevance between the feature corresponding to the at least one child node and the predicted label value.
14. The computer-implemented system of claim 13, wherein determining the score of the parent node based on the respective scores of the child nodes of the decision tree in which the parent node is located comprises: determining an average value of the scores of two child nodes of the parent node.
15. The computer-implemented system of claim 13, wherein determining the score of the parent node based on the respective scores of the child nodes of the decision tree in which the parent node is located comprises: determining respective weights of the scores of the child nodes based on respective quantities of training samples allocated to the child nodes during a training process of the GBDT model; and determining the score of the parent node as a weighted average value of the scores of two child nodes of the parent node.
16. The computer-implemented system of claim 13, wherein the determining a feature corresponding to the child node and a local increment of the feature on the child node comprises: determining a difference between the score of the child node and the score of the parent node; and using the difference as the local increment of the feature.
17. The computer-implemented system of claim 13, wherein the GBDT model is a classification model or a regression model.
18. The computer-implemented system of claim 13, wherein the predetermined quantity of decision trees ranked among top decision trees comprise a plurality of decision trees that are included in the GBDT model and that are arranged in a predetermined order.